
History Filtering in Imperfect Information Games: Algorithms and Complexity

Neural Information Processing Systems

Historically applied exclusively to perfect information games, depth-limited search with value functions has been key to recent advances in AI for imperfect information games. Most prominent approaches with strong theoretical guarantees require subgame decomposition - a process in which a subgame is computed from public information and player beliefs. However, subgame decomposition can itself require non-trivial computations, and its tractability depends on the existence of efficient algorithms for either full enumeration or generation of the histories that form the root of the subgame. Despite this, no formal analysis of the tractability of such computations has been established in prior work, and application domains have often consisted of games, such as poker, for which enumeration is trivial on modern hardware. Applying these ideas to more complex domains requires understanding their cost. In this work, we introduce and analyze the computational aspects and tractability of filtering histories for subgame decomposition. We show that constructing a single history from the root of the subgame is generally intractable, and we then provide a necessary and sufficient condition for efficient enumeration. We also introduce a novel Markov Chain Monte Carlo-based generation algorithm for trick-taking card games - a domain where enumeration is often prohibitively expensive - and our experiments demonstrate its improved scalability. These contributions clarify when and how depth-limited search via subgame decomposition can be an effective tool for sequential decision-making in imperfect information settings.
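To make the idea of MCMC-based history generation concrete, here is a minimal sketch: a Metropolis step over hidden card deals, with a symmetric card-swap proposal and a uniform target over deals consistent with public information. The constraint (one opponent is known to be void in hearts) and all names are illustrative assumptions, not the paper's algorithm.

```python
import random

def consistent(hand_a, hand_b):
    # Assumed public constraint observed during play: B showed out of hearts.
    return all(not c.startswith("H") for c in hand_b)

def mcmc_deals(hand_a, hand_b, steps, rng):
    """Metropolis chain over deals; uniform target + symmetric swap proposal
    means a proposal is accepted iff it stays consistent."""
    samples = []
    hand_a, hand_b = list(hand_a), list(hand_b)
    for _ in range(steps):
        i = rng.randrange(len(hand_a))
        j = rng.randrange(len(hand_b))
        hand_a[i], hand_b[j] = hand_b[j], hand_a[i]       # propose a swap
        if not consistent(hand_a, hand_b):
            hand_a[i], hand_b[j] = hand_b[j], hand_a[i]   # reject: undo swap
        samples.append((tuple(hand_a), tuple(hand_b)))
    return samples

rng = random.Random(0)
# Start from any consistent deal: all hearts sit in A's hidden hand.
a = ["H2", "H3", "S2", "S3"]
b = ["C2", "C3", "D2", "D3"]
chain = mcmc_deals(a, b, 1000, rng)
assert all(consistent(list(x), list(y)) for x, y in chain)
```

Unlike full enumeration, each step costs O(1), which is the scalability advantage when the number of consistent deals is astronomically large.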




Code World Models for General Game Playing

Lehrach, Wolfgang, Hennes, Daniel, Lazaro-Gredilla, Miguel, Lou, Xinghua, Wendelken, Carter, Li, Zun, Dedieu, Antoine, Grau-Moya, Jordi, Lanctot, Marc, Iscen, Atil, Schultz, John, Chiam, Marcus, Gemp, Ian, Zielinski, Piotr, Singh, Satinder, Murphy, Kevin P.

arXiv.org Artificial Intelligence

The reasoning abilities of Large Language Models (LLMs) are increasingly being applied to classical board and card games, but the dominant approach -- prompting for direct move generation -- has significant drawbacks. It relies on the model's implicit, fragile pattern-matching capabilities, leading to frequent illegal moves and strategically shallow play. Here we introduce an alternative approach: we use the LLM to translate natural language rules and game trajectories into a formal, executable code world model (CWM) represented as Python code. This generated model -- comprising functions for state transition, legal move enumeration, and termination checks -- serves as a verifiable simulation engine for high-performance planning algorithms like Monte Carlo tree search (MCTS). In addition, we prompt the LLM to generate heuristic value functions (to make MCTS more efficient) and inference functions (to estimate hidden states in imperfect information games). Our method offers three distinct advantages compared to directly using the LLM as a policy: (1) Verifiability: the generated CWM serves as a formal specification of the game's rules, allowing planners to algorithmically enumerate valid actions and avoid illegal moves, contingent on the correctness of the synthesized model; (2) Strategic Depth: we combine LLM semantic understanding with the deep search power of classical planners; and (3) Generalization: we direct the LLM to focus on the meta-task of data-to-code translation, enabling it to adapt to new games more easily. We evaluate our agent on 10 different games, of which 4 are novel and created for this paper. 5 of the games are fully observed (perfect information), and 5 are partially observed (imperfect information). We find that our method outperforms or matches Gemini 2.5 Pro in 9 out of the 10 considered games.
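The abstract describes the generated world model as a small set of Python functions a planner can drive. Here is a hedged sketch of what such an interface might look like (the function names and the tiny Nim game are illustrative assumptions, not the paper's schema), with an exhaustive-search planner standing in for MCTS:

```python
# Tiny Nim: players alternately remove 1 or 2 stones; taking the last wins.
def initial_state():
    return (7, 0)                       # (stones left, player to move)

def legal_moves(state):
    stones, _ = state
    return [m for m in (1, 2) if m <= stones]

def next_state(state, move):
    stones, player = state
    return (stones - move, 1 - player)

def is_terminal(state):
    return state[0] == 0

def winner(state):
    return 1 - state[1]                 # the player who just moved wins

# Any planner only needs the functions above; here, plain exhaustive search.
def best_move(state):
    me = state[1]
    def wins(s):
        if is_terminal(s):
            return winner(s) == me
        if s[1] == me:
            return any(wins(next_state(s, m)) for m in legal_moves(s))
        return all(wins(next_state(s, m)) for m in legal_moves(s))
    for m in legal_moves(state):
        if wins(next_state(state, m)):
            return m
    return legal_moves(state)[0]
```

Because `legal_moves` is executable code rather than an LLM sample, a planner built on it can never emit an illegal move, which is the verifiability advantage the abstract claims.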


Forget Tomb Raider and Uncharted, there's a new generation of games about archaeology – sort of

The Guardian

The game I'm most looking forward to right now is Big Walk, the latest title from House House, creators of the brilliant Untitled Goose Game. A cooperative multiplayer adventure where players are let loose to explore an open world, I'm interested to see what emergent gameplay comes out of it. Could Big Walk allow for a kind of community archaeology with friends? When games use environmental storytelling in their design – from the positioning of objects to audio recordings or graffiti – they invite players to role-play as archaeologists. Game designer Ben Esposito infamously joked back in 2016 that environmental storytelling is the "art of placing skulls near a toilet" – which might have been a jab at the tropes of games like the Fallout series, but his quip demonstrates how formulaic archaeological gaming narratives can be.


PolicyEvol-Agent: Evolving Policy via Environment Perception and Self-Awareness with Theory of Mind

Yu, Yajie, Feng, Yue

arXiv.org Artificial Intelligence

Multi-agent systems powered by Large Language Models (LLMs) have exhibited significant intelligence in real-world simulations, owing to their capabilities in social cognition and knowledge retrieval. However, existing research on agents equipped with effective cognition chains including reasoning, planning, decision-making and reflecting remains limited, especially in dynamically interactive scenarios. In addition, unlike humans, prompt-based responses face challenges in psychological state perception and empirical calibration during uncertain gaming processes, which can inevitably lead to cognition bias. In light of the above, we introduce PolicyEvol-Agent, a comprehensive LLM-empowered framework characterized by systematically acquiring the intentions of others and adaptively optimizing irrational strategies for continual enhancement. Specifically, PolicyEvol-Agent first obtains reflective expertise patterns and then integrates a range of cognitive operations with Theory of Mind alongside internal and external perspectives. Simulation results, outperforming RL-based models and agent-based methods, demonstrate the superiority of PolicyEvol-Agent for final gaming victory. Moreover, the policy evolution mechanism reveals the effectiveness of dynamic guideline adjustments in both automatic and human evaluation.


Research Vision: Multi-Agent Path Planning for Cops And Robbers Via Reactive Synthesis

Fishell, William, Rodriguez, Andoni, Santolucito, Mark

arXiv.org Artificial Intelligence

Reactive synthesis is classically modeled as a game, though often applied to domains such as arbiter circuits and communication protocols [1]. We aim to show how reactive synthesis can be applied to a literal game - cops and robbers - to generate strategies for agents in the game. We propose a game that requires the coordination of multiple agents in a space of datatypes and operations that are richer than is easily captured by the traditional Linear Temporal Logic (LTL) approach of synthesis over Boolean streams [2]. In particular, we draw inspiration from prior work on Coordination Synthesis [3], LTL modulo theories (LTLt) [4], and Temporal Stream Logic Modulo Theories (TSL-MT) [5, 6] to describe our problem and potential solution spaces. The traditional game [7] asks whether K cops can catch a single robber on a graph. In a temporal logic setting, this amounts to a safety condition for the robber (it is never caught by the cops) and the dual liveness condition for the cops (they eventually catch the robber). We modify the traditional graph-theory-focused version of the game into a more visual game on a grid system, allowing for various configurations, including an environment with various node types such as walls, safe zones, and open spaces.
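As a minimal sketch of the grid arena described above (the grid layout, node types, and capture condition are assumptions for illustration, not the authors' encoding), the safety/liveness duality reduces to a simple capture predicate over agent positions:

```python
# '#' marks a wall; '.' is open space. Agents occupy (row, col) cells.
GRID = [
    "..#..",
    ".#...",
    ".....",
]

def passable(pos):
    r, c = pos
    return (0 <= r < len(GRID) and 0 <= c < len(GRID[0])
            and GRID[r][c] != "#")

def moves(pos):
    # Stay put or step in one of the four cardinal directions.
    r, c = pos
    cand = [(r, c), (r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]
    return [p for p in cand if passable(p)]

def captured(cops, robber):
    # Cops' liveness goal ("eventually captured") is the robber's
    # safety violation ("never captured").
    return robber in cops
```

A synthesized cop strategy would then be a function from game histories to moves that forces `captured` to hold on every play, while a robber strategy tries to keep it false forever.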


From Natural Language to Extensive-Form Game Representations

Deng, Shilong, Wang, Yongzhao, Savani, Rahul

arXiv.org Artificial Intelligence

We introduce a framework for translating game descriptions in natural language into extensive-form representations in game theory, leveraging Large Language Models (LLMs) and in-context learning. Given the varying levels of strategic complexity in games, such as perfect versus imperfect information, directly applying in-context learning would be insufficient. To address this, we introduce a two-stage framework with specialized modules to enhance in-context learning, enabling it to divide and conquer the problem effectively. In the first stage, we tackle the challenge of imperfect information by developing a module that identifies information sets along with the corresponding partial tree structure. With this information, the second stage leverages in-context learning alongside a self-debugging module to produce a complete extensive-form game tree represented using pygambit, the Python API of a recognized game-theoretic analysis tool called Gambit. Using this Python representation enables the automation of tasks such as computing Nash equilibria directly from natural language descriptions. We evaluate the performance of the full framework, as well as its individual components, using various LLMs on games with different levels of strategic complexity. Our experimental results show that the framework significantly outperforms baseline models in generating accurate extensive-form games, with each module playing a critical role in its success.
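For readers unfamiliar with the target representation, here is a hedged plain-Python sketch of an extensive-form game tree and backward induction on a perfect-information example (a classic entry game). This is illustrative only; it does not use pygambit and is not the paper's generated output.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    player: int = -1                               # -1 marks a terminal node
    payoffs: tuple = ()                            # payoff vector at terminals
    children: dict = field(default_factory=dict)   # action label -> Node

def backward_induction(node):
    """Return (payoff vector, plan) for subgame-perfect play."""
    if node.player == -1:
        return node.payoffs, {}
    best_action, best_payoffs, plan = None, None, {}
    for action, child in node.children.items():
        payoffs, sub = backward_induction(child)
        plan.update(sub)
        if best_payoffs is None or payoffs[node.player] > best_payoffs[node.player]:
            best_action, best_payoffs = action, payoffs
    plan[id(node)] = best_action
    return best_payoffs, plan

# Entry game: player 0 enters or stays out; on entry, player 1 fights
# or accommodates.
game = Node(player=0, children={
    "out": Node(payoffs=(0, 2)),
    "in": Node(player=1, children={
        "fight": Node(payoffs=(-1, -1)),
        "accommodate": Node(payoffs=(1, 1)),
    }),
})
value, plan = backward_induction(game)
```

Imperfect information adds information sets grouping nodes a player cannot distinguish, which is exactly the structure the framework's first-stage module has to recover and which pygambit represents natively.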


Improve Value Estimation of Q Function and Reshape Reward with Monte Carlo Tree Search

Li, Jiamian

arXiv.org Artificial Intelligence

Reinforcement learning has achieved remarkable success in perfect information games such as Go and Atari, enabling agents to compete at the highest levels against human players. However, research in reinforcement learning for imperfect information games has been relatively limited due to their more complex game structures and randomness. Traditional methods face challenges in training and improving performance in imperfect information games due to issues like inaccurate Q value estimation and reward sparsity. In this paper, we focus on Uno, an imperfect information game, and aim to address these problems by reducing Q value overestimation and reshaping the reward function. We propose a novel algorithm that utilizes Monte Carlo Tree Search to average the value estimations in the Q function. Even though we choose Double Deep Q Learning as the foundational framework in this paper, our method can be generalized and used in any algorithm that needs Q value estimation, such as Actor-Critic methods. Additionally, we employ Monte Carlo Tree Search to reshape the reward structure in the game environment. We compare our algorithm with several traditional methods applied to such games, including Double Deep Q Learning, Deep Monte Carlo, and Neural Fictitious Self Play, and the experiments demonstrate that our algorithm consistently outperforms these approaches, especially as the number of players in Uno increases, indicating a higher level of difficulty.
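The overestimation problem the abstract targets can be demonstrated in a few lines: taking a max over noisy Q estimates is biased upward, while averaging several independent estimates per action (as repeated tree-search simulations effectively do) shrinks the noise before the max is taken. This toy sketch is our own illustration of that principle, not the paper's algorithm; the true Q values are all zero, so any positive max is pure bias.

```python
import random

rng = random.Random(0)
ACTIONS = range(10)

def noisy_q():
    # True Q value of every action is 0; estimates carry unit Gaussian noise.
    return {a: rng.gauss(0.0, 1.0) for a in ACTIONS}

def max_single():
    # Standard max over one noisy estimate per action (overestimates).
    return max(noisy_q().values())

def max_averaged(n_sims=20):
    # Average n_sims independent estimates per action, then take the max.
    totals = {a: 0.0 for a in ACTIONS}
    for _ in range(n_sims):
        q = noisy_q()
        for a in ACTIONS:
            totals[a] += q[a]
    return max(v / n_sims for v in totals.values())

trials = 200
bias_single = sum(max_single() for _ in range(trials)) / trials
bias_avg = sum(max_averaged() for _ in range(trials)) / trials
assert bias_avg < bias_single   # averaging shrinks the upward bias
```

Averaging over n simulations divides the noise variance by n, so the max operator has far less spurious spread to exploit.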


Tree Search for Simultaneous Move Games via Equilibrium Approximation

Yu, Ryan, Olshevsky, Alex, Chin, Peter

arXiv.org Artificial Intelligence

Neural-network-supported tree search has shown strong results in a variety of perfect information multi-agent tasks. However, the performance of these methods on partial information games has generally been below that of competing approaches. Here we study the class of simultaneous-move games, a subclass of partial information games that is closest to perfect information games: both agents know the game state except for the opponent's move, which is revealed only after each agent commits to its own move. Simultaneous-move games include popular benchmarks such as Google Research Football and StarCraft. In this study we answer the question: can we take tree search algorithms trained through self-play in perfect information settings and adapt them to simultaneous-move games without significant loss of performance? We answer this question by deriving a practical method that approximates a coarse correlated equilibrium as a subroutine within a tree search. Our algorithm works on cooperative, competitive, and mixed tasks. Our results are better than the current best MARL algorithms on a wide range of accepted baseline environments.
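A standard way to approximate equilibria at a single simultaneous-move node is regret matching, whose time-averaged play converges to the set of coarse correlated equilibria. The sketch below (our own illustration under assumed payoffs, not the paper's subroutine) runs it on a biased matching-pennies game whose unique mixed equilibrium has both players choosing action 0 with probability 0.4:

```python
PAYOFF = [[2.0, -1.0],     # row player's payoff; column gets the negative
          [-1.0, 1.0]]     # mixed equilibrium: both play action 0 w.p. 0.4

def strategy(regrets):
    # Play each action in proportion to its positive cumulative regret.
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [0.5, 0.5]

def regret_matching(iters=50_000):
    regrets = [[0.0, 0.0], [0.0, 0.0]]
    avg = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(iters):
        s0, s1 = strategy(regrets[0]), strategy(regrets[1])
        for p, s in ((0, s0), (1, s1)):
            avg[p] = [a + x for a, x in zip(avg[p], s)]
        # Expected payoff of each pure action vs. the opponent's mix.
        u0 = [sum(PAYOFF[a][b] * s1[b] for b in range(2)) for a in range(2)]
        u1 = [-sum(PAYOFF[a][b] * s0[a] for a in range(2)) for b in range(2)]
        ev0 = sum(x * u for x, u in zip(s0, u0))
        ev1 = sum(x * u for x, u in zip(s1, u1))
        for a in range(2):
            regrets[0][a] += u0[a] - ev0
            regrets[1][a] += u1[a] - ev1
    return [[x / iters for x in row] for row in avg]

avg0, avg1 = regret_matching()
```

Inside a tree search, a few such iterations per node yield a joint policy to sample child nodes from, replacing the single-agent argmax used in perfect information search.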